Abstract

The generalized Kullback-Leibler divergence (K-Ld) in Tsallis statistics [constrained 
by the additive duality of generalized statistics (dual generalized K-Ld)] is here rec- 
onciled with the theory of Bregman divergences for expectations defined by normal 
averages, within a measure-theoretic framework. Specifically, it is demonstrated 
that the dual generalized K-Ld is a scaled Bregman divergence. The Pythagorean 
theorem is derived from the minimum discrimination information-principle using 
the dual generalized K-Ld as the measure of uncertainty, with constraints defined 
by normal averages. The minimization of the dual generalized K-Ld, with normal 
averages constraints, is shown to exhibit distinctly unique features. 
1 Introduction



The generalized (also, interchangeably, nonadditive, deformed, or nonextensive)
statistics of Tsallis has recently been the focus of much attention in
statistical physics, complex systems, and allied disciplines [1]. Nonadditive
statistics suitably generalizes the extensive, orthodox Boltzmann-Gibbs-Shannon
(B-G-S) one. The scope of Tsallis statistics has lately been extended to
studies of lossy data compression in communication theory [2] and machine
learning [3,4]. A critical allied concept is that of relative entropy, also known as
Kullback-Leibler divergence (K-Ld), which constitutes a fundamental distance- 
measure in information theory [5]. The generalized K-Ld [6] encountered in 
deformed statistics has been described by Naudts [7] as a special form of f- 
divergences [8]. A related notion is that of Bregman divergences [9]. These 
are information-geometric tools of great significance in a variety of disciplines 
ranging from lossy data compression and machine learning [10] to statistical 
physics [11]. 

The generalized K-Ld in a Tsallis scenario (see Eq. (6) of this Letter) is not a 
Bregman divergence, which constitutes a serious shortcoming. This is unlike 
the case of the K-Ld in the B-G-S framework, which is indeed a Bregman 
divergence [10]. This forecloses the ability of the generalized K-Ld to extend 
to the case of generalized statistics the bijection-property between exponen- 
tial families of distributions and the K-Ld, and other fundamental properties 
of Bregman divergences, true in the B-G-S framework. The consequence of 
the bijection property is that every regular exponential family corresponds to 
a unique and distinct Bregman divergence (one-to-one mapping), and, there 
exists a regular exponential family corresponding to every choice of Bregman 
divergence (onto mapping). The bijection property has immense utility in ma- 
chine learning, feature extraction, and allied disciplines [10, 12, 13]. 

A recent study [14] has established that the dual generalized K-Ld is a scaled
Bregman divergence in a discrete setting. Further, Ref. [14] has tacitly put
forth the necessity of employing, within the framework of generalized
statistics, the dual generalized K-Ld (see Eq. (7) of this Letter), a scaled
Bregman divergence, as the measure of uncertainty in analysis based on the
minimum discrimination information (minimum cross entropy) principle of
Kullback [15] and Kullback and Khairat [16]. Scaled Bregman divergences,
formally introduced by Stummer [17] and Stummer and Vajda [18], unify separable
Bregman divergences [9] and f-divergences [8].

At this juncture, introduction of some definitions is in order. 
Definition 1 (Bregman divergences) [9]: Let $\phi$ be a real valued strictly
convex function defined on the convex set $S \subseteq \mathrm{dom}(\phi)$, the
domain of $\phi$, such that $\phi$ is differentiable on $\mathrm{ri}(S)$, the
relative interior of $S$. The Bregman divergence
$B_\phi : S \times \mathrm{ri}(S) \mapsto [0,\infty)$ is defined as:
$B_\phi(z_1,z_2) = \phi(z_1) - \phi(z_2) - \langle z_1 - z_2, \nabla\phi(z_2) \rangle$,
where $\nabla\phi(z_2)$ is the gradient of $\phi$ evaluated at $z_2$. $\square$
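As an illustration of Definition 1, the following minimal sketch (not part of the original analysis; all function names are hypothetical) evaluates $B_\phi$ for the generating function $\phi(z) = \sum_i z_i \log z_i$. For normalized arguments this Bregman divergence coincides with the ordinary K-Ld, which is the B-G-S property referred to above.

```python
import math

def bregman(phi, grad_phi, z1, z2):
    """B_phi(z1, z2) = phi(z1) - phi(z2) - <z1 - z2, grad phi(z2)>."""
    inner = sum((a - b) * g for a, b, g in zip(z1, z2, grad_phi(z2)))
    return phi(z1) - phi(z2) - inner

def neg_entropy(z):
    # Generating function phi(z) = sum_i z_i log z_i (strictly convex for z_i > 0).
    return sum(v * math.log(v) for v in z)

def grad_neg_entropy(z):
    # Gradient of phi: d/dz_i (z_i log z_i) = log z_i + 1.
    return [math.log(v) + 1.0 for v in z]

def kl(p, r):
    # Ordinary (B-G-S) Kullback-Leibler divergence between two pmfs.
    return sum(a * math.log(a / b) for a, b in zip(p, r))

p = [0.2, 0.5, 0.3]
r = [0.4, 0.4, 0.2]
print(bregman(neg_entropy, grad_neg_entropy, p, r), kl(p, r))
```

The two printed numbers agree because the linear correction term $\sum_i (p_i - r_i)$ vanishes when both arguments sum to one.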

Definition 2 (Notations) [18]: $\mathcal{M}$ denotes the space of all finite
measures on a measurable space $(\mathcal{X}, \mathcal{A})$ and
$\mathcal{P} \subset \mathcal{M}$ the subspace of all probability measures.
Unless otherwise explicitly stated $P, R, M$ are mutually measure-theoretically
equivalent measures on $(\mathcal{X}, \mathcal{A})$ dominated by a
$\sigma$-finite measure $\lambda$ on $(\mathcal{X}, \mathcal{A})$.
Then the densities defined by the Radon-Nikodym derivatives

$$ p = \frac{dP}{d\lambda}, \quad r = \frac{dR}{d\lambda}, \quad m = \frac{dM}{d\lambda} \tag{1} $$

have a common support which will be identified with $\mathcal{X}$. Unless
stated otherwise, it is assumed that $P, R \in \mathcal{P}$,
$M \in \mathcal{M}$, and that $\phi : (0,\infty) \mapsto \mathbb{R}$ is a
continuous and convex function.

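Definition 3 (Scaled Bregman Divergences) [18]: The Bregman divergence
of probability measures $P, R$ scaled by an arbitrary measure $M$ on
$(\mathcal{X}, \mathcal{A})$ measure-theoretically equivalent with $P, R$ is
defined by

$$ B_\phi(P,R\,|\,M) = \int_{\mathcal{X}} \left[ \phi\left(\frac{p}{m}\right) - \phi\left(\frac{r}{m}\right) - \phi'\left(\frac{r}{m}\right)\left(\frac{p}{m} - \frac{r}{m}\right) \right] dM = \int_{\mathcal{X}} \left[ m\,\phi\left(\frac{p}{m}\right) - m\,\phi\left(\frac{r}{m}\right) - \phi'\left(\frac{r}{m}\right)(p - r) \right] d\lambda. \tag{2} $$

The convex $\phi$ may be interpreted as the generating function of the divergence.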
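Definition 3 can be made concrete on a finite set, where $\lambda$ is taken to be the counting measure and the integral in (2) becomes a sum. The sketch below is only an illustration under that assumption (the function name `scaled_bregman` and the test vectors are hypothetical). It checks two standard special cases: $\phi(t) = t\log t$ scaled by $M = R$ gives the ordinary K-Ld, and $\phi(t) = t^2$ with $m \equiv 1$ gives the squared Euclidean distance.

```python
import math

def scaled_bregman(phi, dphi, p, r, m):
    """Discrete form of Eq. (2): sum_x m [phi(p/m) - phi(r/m) - phi'(r/m)(p/m - r/m)]."""
    total = 0.0
    for p_i, r_i, m_i in zip(p, r, m):
        t, z = p_i / m_i, r_i / m_i
        total += m_i * (phi(t) - phi(z) - dphi(z) * (t - z))
    return total

p = [0.2, 0.5, 0.3]
r = [0.4, 0.4, 0.2]

# phi(t) = t log t, scaled by M = R: an f-divergence, here the ordinary K-Ld.
kl_via_bregman = scaled_bregman(lambda t: t * math.log(t),
                                lambda t: math.log(t) + 1.0, p, r, m=r)
kl_direct = sum(a * math.log(a / b) for a, b in zip(p, r))

# phi(t) = t^2 with m = 1 (M = lambda): a separable Bregman divergence.
sq_via_bregman = scaled_bregman(lambda t: t * t, lambda t: 2.0 * t,
                                p, r, m=[1.0, 1.0, 1.0])
sq_direct = sum((a - b) ** 2 for a, b in zip(p, r))

print(kl_via_bregman, kl_direct)
print(sq_via_bregman, sq_direct)
```

The two cases show how the choice of scaling measure moves the same construction between the f-divergence and separable Bregman families.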

Definition 4 [19, 20]: Let $(\mathcal{X}, \mathcal{A})$ be a measurable space
while symbols $P, R$ denote probability measures on
$(\mathcal{X}, \mathcal{A})$. Let $p, r > 0$ denote $\mathcal{A}$-measurable
functions on the finite set $\mathcal{X}$. An $\mathcal{A}$-measurable function
$p : \mathcal{X} \mapsto \mathbb{R}$ is said to be a probability density
function (pdf) if $\int_{\mathcal{X}} p\, d\lambda = 1$. In this setting, the
measure $P$ is induced by $p$, i.e.,

$$ P(E) = \int_{E} p\, d\lambda; \quad \forall E \in \mathcal{A}. \tag{3} $$

Definition 4 provides a principled theoretical basis to seamlessly alternate
between probability measures and pdfs as per the convenience of the analysis.



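The generalized K-Ld is defined in the continuous form as [7]

$$ D^{\kappa}_{K-L}(p\|r) = \frac{1}{\kappa} \int_{\mathcal{X}} p \left[ \left(\frac{p}{r}\right)^{\kappa} - 1 \right] d\lambda = -\int_{\mathcal{X}} p\, \omega_\kappa\!\left(\frac{r}{p}\right) d\lambda, \tag{4} $$

where $p$ is an arbitrary pdf, $r$ is the reference pdf, and $\kappa$ is some
nonadditivity parameter satisfying: $-1 < \kappa < 1$; $\kappa \neq 0$. Here,
(4) employs the definition of the deduced logarithm [7]

$$ \omega_\kappa(x) = \frac{1}{\kappa}\left(1 - x^{-\kappa}\right). \tag{5} $$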



Specializing the above theory to the case of Tsallis scenario by setting
$\kappa = q - 1$ yields the usual doubly convex generalized K-Ld [6]

$$ D^{q}_{K-L}(p\|r) = \frac{1}{q-1} \int_{\mathcal{X}} p \left[ \left(\frac{p}{r}\right)^{q-1} - 1 \right] d\lambda. \tag{6} $$

Note that the normalization condition is: $\int_{\mathcal{X}} p\, d\lambda = 1$.
This result is readily extended to the continuous case.

The additive duality is a fundamental property in generalized statistics [1].
One implication of the additive duality is that it permits a deformed logarithm 
defined by a given nonadditivity parameter (say, q) to be inferred from its dual 
deformed logarithm [1,7] parameterized by: q* = 2 — q. Section 4 of this Letter 
highlights an important feature of Tsallis measures of uncertainty subjected 
to the additive duality when performing variational minimization. 

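Re-parameterizing (6) by specifying: $q \to 2 - q = q^*$ yields the dual
generalized K-Ld

$$ D^{q^*}_{K-L}(p\|r) = \frac{1}{1-q^*} \int_{\mathcal{X}} p \left[ \left(\frac{p}{r}\right)^{1-q^*} - 1 \right] d\lambda = \int_{\mathcal{X}} p \ln_{q^*}\!\left(\frac{p}{r}\right) d\lambda = \int_{\mathcal{X}} \ln_{q^*}\!\left(\frac{p}{r}\right) dP = D^{q^*}_{K-L}(P\|R). \tag{7} $$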



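Proposition 1: $D^{q^*}_{K-L}$ is jointly convex in the pair $(p\|q)$. Given
probability mass functions $(p_1, q_1)$ and $(p_2, q_2)$, then

$$ D^{q^*}_{K-L}\left( \lambda p_1 + (1-\lambda) p_2 \,\|\, \lambda q_1 + (1-\lambda) q_2 \right) \leq \lambda D^{q^*}_{K-L}(p_1\|q_1) + (1-\lambda) D^{q^*}_{K-L}(p_2\|q_2), $$

$\forall \lambda \in [0,1]$. Here $q_1, q_2$ denote probability mass functions
(not the nonadditivity parameter), and $\lambda$ is a mixing weight rather than
the dominating measure. This result seamlessly extends to the continuous setting.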
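Proposition 1 can be spot-checked numerically. The sketch below is an illustration rather than a proof; it assumes the discrete form $\sum_i p_i \ln_{q^*}(p_i/r_i)$ of the dual generalized K-Ld and the standard deformed logarithm $\ln_q(x) = (x^{1-q}-1)/(1-q)$. The function names are hypothetical, and the second arguments are called `r1`, `r2` to avoid a clash with the parameter $q$.

```python
import math

def ln_q(x, q):
    # Deformed logarithm; reduces to the natural logarithm at q = 1.
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def dual_kld(p, r, q_star):
    # Discrete dual generalized K-Ld: sum_i p_i ln_{q*}(p_i / r_i).
    return sum(a * ln_q(a / b, q_star) for a, b in zip(p, r))

def mix(a, b, w):
    # Convex combination w * a + (1 - w) * b of two pmfs.
    return [w * x + (1.0 - w) * y for x, y in zip(a, b)]

def convexity_gap(p1, r1, p2, r2, w, q_star):
    # Right-hand side minus left-hand side of the joint convexity inequality.
    lhs = dual_kld(mix(p1, p2, w), mix(r1, r2, w), q_star)
    rhs = w * dual_kld(p1, r1, q_star) + (1.0 - w) * dual_kld(p2, r2, q_star)
    return rhs - lhs

p1, r1 = [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]
p2, r2 = [0.6, 0.1, 0.3], [0.3, 0.3, 0.4]
for q_star in (0.5, 1.5):
    print(q_star, [convexity_gap(p1, r1, p2, r2, w, q_star)
                   for w in (0.0, 0.25, 0.5, 0.75, 1.0)])
```

A nonnegative gap for every mixing weight is what joint convexity requires; a finite sample of weights can of course only fail to find a counterexample.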

An important issue to address concerns the manner in which expectation values
are computed. Nonextensive statistics has employed a number of forms in which
expectations may be defined. Prominent among these are the linear constraints
originally employed by Tsallis [1] (also known as normal averages) of the
form: $\langle A \rangle = \sum_i p_i A_i$, the Curado-Tsallis (C-T)
constraints [21] of the form: $\langle A \rangle_q = \sum_i p_i^q A_i$, and the
normalized Tsallis-Mendes-Plastino (TMP) constraints [22] (also known as
q-averages) of the form:
$\langle\langle A \rangle\rangle_q = \sum_i p_i^q A_i \big/ \sum_i p_i^q$.
A fourth constraining procedure is the optimal Lagrange multiplier (OLM)
approach [23]. Of these four methods to describe expectations, the most
commonly employed by Tsallis-practitioners is the TMP-one.
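The three explicit forms of expectation can be compared on a toy distribution. The following sketch is illustrative only, with hypothetical function names and made-up numbers; it also shows that the C-T average of the constant observable $A \equiv 1$ differs from one when $q \neq 1$, whereas the normal and q-averages return one.

```python
def normal_average(p, a):
    # <A> = sum_i p_i A_i
    return sum(p_i * a_i for p_i, a_i in zip(p, a))

def curado_tsallis_average(p, a, q):
    # <A>_q = sum_i p_i^q A_i
    return sum((p_i ** q) * a_i for p_i, a_i in zip(p, a))

def q_average(p, a, q):
    # <<A>>_q = sum_i p_i^q A_i / sum_i p_i^q  (normalized TMP form)
    return curado_tsallis_average(p, a, q) / sum(p_i ** q for p_i in p)

p = [0.2, 0.5, 0.3]
a = [1.0, 2.0, 4.0]
ones = [1.0, 1.0, 1.0]
q = 0.7

print(normal_average(p, a), curado_tsallis_average(p, a, q), q_average(p, a, q))
print(normal_average(p, ones), curado_tsallis_average(p, ones, q), q_average(p, ones, q))
```

At $q = 1$ all three definitions reduce to the same linear average.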



Footnote: Here "$\to$" denotes a re-parameterization of the nonadditivity
parameter, and is not a limit.

Footnote: In this Letter, $\langle \bullet \rangle$ denotes an expectation.
The originally employed normal averages constraints were abandoned because of
difficulty in evaluating the partition function, except for very simple
cases. The C-T constraints were replaced by the TMP constraints because:
$\langle 1 \rangle_q \neq 1$. Recent works by Abe [24] suggest that in
generalized statistics expectations defined in terms of normal averages, in
contrast to those defined by q-averages, are consistent with the generalized
H-theorem and the generalized Stosszahlansatz (molecular chaos hypothesis).
Understandably, a re-formulation of the variational perturbation
approximations in nonextensive statistical physics followed [25], via an
application of q-deformed calculus [26].

The minimum K-Ld principle is of fundamental interest in information theory
and allied disciplines. The nonadditive Pythagorean theorem and triangular
equality have been studied previously by Dukkipati et al. [27,28]. These
studies were however performed on the basis of minimizing the generalized K-Ld
using questionable constraints defined by C-T expectations and q-averages.
The Pythagorean theorem is a fundamental relation in information geometry
whose form and properties are critically dependent upon the measure of
uncertainty employed, and the manner in which expectations (constraints) are
defined.

This Letter fundamentally differs from the studies in Refs. [27] and [28] in a
two-fold manner: (i) the measure of uncertainty is the dual generalized K-Ld
(a scaled Bregman divergence), and (ii) the constraints employed are defined
by normal average constraints, whose use in generalized statistics has been
revived by the methodology of Ferri, Martinez, and Plastino [29].

At this stage, it is important to interpret the findings in Ref. [24] within
the context of the equivalence relationships between normal averages, C-T,
q-averages, and OLM forms of expectations derived in Ref. [29]. First, while
Ref. [24] has suggested the inadequacy of q-averages on physics-based
arguments, the equivalence relationships in [29] are purely mathematical in
nature. Next, [29] provides a mathematical framework to minimize Lagrangians
using the Tsallis entropy employing normal averages expectations.

A notable consequence of minimizing the generalized K-Ld or the dual
generalized K-Ld using normal averages constraints is that the expression for
the posterior probability is self-referential [1]. Specifically, the expression
contains a function of the posterior probability, which is unknown and to be
determined. Fundamental differences in deriving the generalized Pythagorean
theorem in this Letter vis-a-vis the analysis presented in Refs. [27] and [28]
lead to results which are qualitatively distinct from both an
information-geometric and a statistical-physics perspective.

Thus, this Letter establishes the Pythagorean decomposition of the dual
generalized K-Ld (a scaled Bregman divergence) within the framework of deformed
statistics for physically tenable normal averages expectations. Such an
analysis forms the basis to generalize the analysis in [12] for information
theoretic co-clustering for mutual information based models. By definition,
co-clustering involves clustering of data that inhabits an $m \times n$ matrix.
Co-clustering has utility in a number of critical applications such as text
clustering [30] and bio-informatics [31], amongst others.

Note that for mutual information based models, defining the scaled Bregman
information as the normal averages expectation of the dual generalized K-Ld
[14], the Pythagorean theorem derived for the dual generalized K-Ld in this
Letter provides the foundation to extend the optimality of the minimum Bregman
information principle [12, 32], which has immense utility in machine learning
and allied disciplines, and the Bregman projection theorem to the case
of deformed statistics. Finally, the Pythagorean theorem and the minimum
dual generalized K-Ld principle developed in this Letter serve as a basis to
generalize the concept of I-projections [33] to the case of deformed statistics.

This Introductory Section concludes by establishing the qualitatively distinct 
nature of this Letter: 

• (i) This Letter generalizes and extends the analysis in Ref. [14]. In Ref.
[14], it was shown that the dual generalized K-Ld is a scaled Bregman
divergence. This was demonstrated in a discrete setting. The generalization is
accomplished in Section 3 by demonstrating that this property also holds
true in a continuous setting. This is accomplished by expressing the
Radon-Nikodym derivatives (1) as Lebesgue integrals (3). Note that in a
continuous measure-theoretic framework, the relationship ((1) and (3)) between
probability densities and probability measures is transparent. The extension of
the results derived in Ref. [14] is presented in Sections 4 and 5 of this
Letter.

Section 4 takes advantage of the seamless relationship between probabil- 
ity densities and probability measures in a continuous setting to perform 
minimization of the dual generalized K-Ld by employing (1) and (3). First, 
the Lagrangian for the minimum dual generalized K-Ld defined by proba- 
bility densities for normal averages expectations (17), which is characterized 
by Lebesgue integrals, is subjected to a straightforward transformation by 
invoking (1) and (3). This step is followed by a simple minimization of 
the transformed Lagrangian with respect to the probability measure, which 
yields the minimum dual generalized K-Ld criterion (25) defined in terms 
of probability densities. 

This minimum dual generalized K-Ld criterion is then employed as the basis to
derive the Legendre transform relations (26). The Legendre transform
conditions, in conjunction with the Shore expectation matching condition
[34], are central in deriving the Pythagorean theorem for the dual generalized
K-Ld with normal averages constraints (Eq. (40) in Section 5 of this
Letter). At this stage, it is necessary to explain the tenability of employing
the Shore expectation matching condition in generalized statistics, given the
finding in [24] that the Shore-Johnson Axioms [35] (notably Axiom III, system
independence) are not applicable in generalized statistics, which models
complex systems whose elements have strong correlations with each other. In
the Shore expectation matching condition (see Section 5 of this Letter), the
correlations and interactions between elements are self-consistently
incorporated into the probability density with which the expectation is
evaluated. Specifically, the probability density is unambiguously determined
during the process of minimizing the dual generalized K-Ld, using normal
averages constraints. Thus, the Shore expectation matching condition is not
adversely affected by the inapplicability of the Shore-Johnson Axioms when
utilized in deformed statistics.

• (ii) As stated above, the basis for establishing the dual generalized K-Ld
as a scaled Bregman divergence, and the subsequent derivation of the
Pythagorean theorem for normal averages expectations, is motivated by
extending the theory of I-projections [33] to the case of generalized
statistics, and the derivation of iterative numerical schemes (such as
iterative scaling, alternating divergence minimization, and the EM algorithm)
based on a candidate deformed statistics theory of I-projections [36]. For
this, the candidate deformed statistics I-divergence between two probability
densities p and q is to be strictly convex.

This is true for the case of the usual K-Ld, the generalized K-Ld, and as
stated in Proposition 1 of this Section, also holds true for the dual
generalized K-Ld. In Ref. [7], a form of a generalized K-Ld which is a Bregman
divergence has been derived, and employed with normal averages constraints
(for example, by Bagci, Arda, and Sever [37]). However, it is convex only
in terms of one variable and is unsuitable for the primary leitmotif of this
study, i.e. generalizing I-projections and the above stated iterative
numerical schemes [33, 36] to the case of deformed statistics. This form of the
generalized K-Ld which is a Bregman divergence does appear to have
applications in other disciplines, as demonstrated by Ref. [37], amongst other
works.



2 Theoretical preliminaries 

The essential concepts around which this communication revolves are reviewed 
in the following sub-sections. 
2.1 Tsallis entropy and the additive duality



The Tsallis entropy is defined in terms of discrete variables as [1]

$$ S_q(p) = -\frac{1 - \int_{\mathcal{X}} p^q\, d\lambda}{1-q}; \quad \int_{\mathcal{X}} p\, d\lambda = 1. \tag{9} $$

The constant $q$ is referred to as the nonadditive parameter. Here, (9) implies
that extensive B-G-S statistics is recovered as $q \to 1$. Taking the limit
$q \to 1$ in (9) and invoking L'Hospital's rule, $S_q(p) \to S(p)$, i.e., the
Shannon entropy. Nonextensive statistics is intimately related to q-deformed
algebra and calculus (see [26] and the references within). The q-deformed
logarithm and exponential are defined as [26]

$$ \ln_q(x) = \frac{x^{1-q} - 1}{1-q}, $$

and,

$$ \exp_q(x) = \begin{cases} [1 + (1-q)x]^{\frac{1}{1-q}}; & 1 + (1-q)x \geq 0, \\ 0; & \text{otherwise}. \end{cases} \tag{10} $$
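The definitions in (10) translate directly into code. The following is a minimal sketch (the function names are hypothetical) that implements the q-deformed logarithm and exponential, checks that they are mutual inverses where the exponential is not cut off, and checks the $q \to 1$ limit numerically. Treating the boundary case $1 + (1-q)x = 0$ as cut off is a convention chosen here to avoid a division by zero for $q > 1$.

```python
import math

def ln_q(x, q):
    # q-deformed logarithm of Eq. (10); natural logarithm at q = 1.
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(x, q):
    # q-deformed exponential of Eq. (10) with the cut-off branch.
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # Cut-off; base == 0 is also mapped to 0 here (a convention of this sketch).
        return 0.0
    return base ** (1.0 / (1.0 - q))

for q in (0.7, 1.3):
    print(q, exp_q(ln_q(2.5, q), q))        # recovers 2.5 up to rounding
print(ln_q(2.0, 1.000001), math.log(2.0))   # q -> 1 limit
print(exp_q(-10.0, 0.5))                    # cut-off branch
```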



In this respect, an important relation from q-deformed algebra is the
q-deformed difference [26]

$$ \ominus_q x = \frac{-x}{1 + (1-q)x} \;\Rightarrow\; \ln_q\left(\frac{x}{y}\right) = y^{q-1}\left(\ln_q x - \ln_q y\right). \tag{11} $$
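The relations in (11) can be verified numerically. The sketch below (hypothetical function names, arbitrary test values) checks the quotient rule for $\ln_q$, the equivalent form with denominator $1 + (1-q)\ln_q y$ that is used later in the proof of the Pythagorean theorem, and the fact that $\ominus_q$ applied to $\ln_q x$ gives $\ln_q(1/x)$.

```python
import math

def ln_q(x, q):
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def ominus_q(x, q):
    # q-deformed "negative" of Eq. (11): -x / (1 + (1 - q) x).
    return -x / (1.0 + (1.0 - q) * x)

def quotient_rule_rhs(x, y, q):
    # Right-hand side of Eq. (11): y^(q-1) (ln_q x - ln_q y).
    return y ** (q - 1.0) * (ln_q(x, q) - ln_q(y, q))

def quotient_rule_ratio(x, y, q):
    # Same quantity written as (ln_q x - ln_q y) / (1 + (1 - q) ln_q y).
    return (ln_q(x, q) - ln_q(y, q)) / (1.0 + (1.0 - q) * ln_q(y, q))

x, y = 3.0, 1.7
for q in (0.6, 1.4):
    print(q, ln_q(x / y, q), quotient_rule_rhs(x, y, q), quotient_rule_ratio(x, y, q))
    print(q, ln_q(1.0 / x, q), ominus_q(ln_q(x, q), q))
```

The two right-hand sides agree because $1 + (1-q)\ln_q y = y^{1-q}$.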
The Tsallis entropy may be written as [1]

$$ S_q(p) = -\int_{\mathcal{X}} p^q \ln_q p\, d\lambda. \tag{12} $$


This Letter makes prominent use of the additive duality in nonextensive
statistics. Setting $q^* = 2 - q$, from (11) the dual deformed logarithm and
exponential are defined as

$$ \ln_{q^*}(x) = -\ln_q\left(\frac{1}{x}\right), \quad \text{and,} \quad \exp_{q^*}(x) = \frac{1}{\exp_q(-x)}. \tag{13} $$

The dual Tsallis entropy, and the dual generalized K-Ld, may thus be written
as

$$ S_{q^*}(p) = -\int_{\mathcal{X}} p \ln_{q^*} p\, d\lambda, \quad \text{and,} \quad D^{q^*}_{K-L}[p\|r] = \int_{\mathcal{X}} p \ln_{q^*}\left(\frac{p}{r}\right) d\lambda, \tag{14} $$

respectively. Note that the dual Tsallis entropy acquires a form identical to
the B-G-S entropies, with $\ln_{q^*}(\bullet)$ replacing $\log(\bullet)$ [2].
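The additive duality relations (13) and (14) lend themselves to a direct numerical check. The sketch below assumes the discrete setting and uses hypothetical function names. With $q^* = 2 - q$ it confirms that the dual logarithm and exponential satisfy (13), that the dual Tsallis entropy equals the Tsallis entropy of (9), and that the dual generalized K-Ld of (14) equals the generalized K-Ld written in the explicit form $\frac{1}{q-1}\sum_i p_i[(p_i/r_i)^{q-1} - 1]$.

```python
import math

def ln_q(x, q):
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(x, q):
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    return 0.0 if base <= 0.0 else base ** (1.0 / (1.0 - q))

def tsallis_entropy(p, q):
    return -(1.0 - sum(p_i ** q for p_i in p)) / (1.0 - q)

def dual_tsallis_entropy(p, q_star):
    # Eq. (14): -sum_i p_i ln_{q*}(p_i)
    return -sum(p_i * ln_q(p_i, q_star) for p_i in p)

def generalized_kld(p, r, q):
    # Explicit form: (1 / (q - 1)) sum_i p_i [(p_i / r_i)^(q - 1) - 1]
    return sum(a * ((a / b) ** (q - 1.0) - 1.0) for a, b in zip(p, r)) / (q - 1.0)

def dual_kld(p, r, q_star):
    # Eq. (14): sum_i p_i ln_{q*}(p_i / r_i)
    return sum(a * ln_q(a / b, q_star) for a, b in zip(p, r))

q = 0.6
q_star = 2.0 - q
p = [0.2, 0.5, 0.3]
r = [0.4, 0.4, 0.2]

print(ln_q(3.0, q_star), -ln_q(1.0 / 3.0, q))
print(exp_q(0.5, q_star), 1.0 / exp_q(-0.5, q))
print(dual_tsallis_entropy(p, q_star), tsallis_entropy(p, q))
print(dual_kld(p, r, q_star), generalized_kld(p, r, q))
```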






3 Dual generalized K-Ld as a scaled Bregman divergence 



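Theorem 1: Let $t = \frac{p}{m}$ and $z = \frac{r}{m}$, with
$P, R \in \mathcal{P}$ and $m$ being the scaling. For the convex generating
function of the scaled Bregman divergence: $\phi(t) = t \ln_{q^*} t$,
the scaled Bregman divergence acquires the form of the dual generalized K-Ld:
$B_\phi(P,R\,|\,M=R) = \int_{\mathcal{X}} p \ln_{q^*}\left(\frac{p}{r}\right) d\lambda$.

Proof:

From (1) and (2)

$$ \begin{aligned}
B_\phi(P,R\,|\,M) &= \int_{\mathcal{X}} \left[ \frac{p}{m} \ln_{q^*}\frac{p}{m} - \frac{r}{m} \ln_{q^*}\frac{r}{m} - \left(\frac{p}{m} - \frac{r}{m}\right) \phi'\left(\frac{r}{m}\right) \right] dM \\
&= \int_{\mathcal{X}} \left[ p \ln_{q^*}\frac{p}{m} - p \ln_{q^*}\frac{r}{m} - (p - r)\left(\frac{r}{m}\right)^{1-q^*} \right] d\lambda \\
&\overset{(a)}{=} \int_{\mathcal{X}} \left\{ p\, m^{q^*-1} \left[ \ln_{q^*} p - \ln_{q^*} r \right] - (p - r)\left(\frac{r}{m}\right)^{1-q^*} \right\} d\lambda,
\end{aligned} \tag{15} $$

where (a) implies invoking the q-deformed difference (11) with $q^*$ replacing
$q$. Setting $m = r$ in the integrand of (15) and re-invoking (11) yields (7)

$$ B_\phi(P,R\,|\,M=R) = \int_{\mathcal{X}} p \ln_{q^*}\left(\frac{p}{r}\right) d\lambda = \int_{\mathcal{X}} \ln_{q^*}\left(\frac{p}{r}\right) dP. \tag{16} $$

This is a q*-deformed f-divergence and is consistent with the theory derived
in Refs. [17] and [18], when extended to deformed statistics in a continuous
setting.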
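Theorem 1 and the intermediate steps of (15) can be checked on a finite set, taking $\lambda$ to be the counting measure. The following sketch is an illustration with hypothetical names and arbitrary test vectors. It uses $\phi(t) = t\ln_{q^*}t$ with derivative $\phi'(t) = \ln_{q^*}t + t^{1-q^*}$, compares the defining sum (2) against the last line of (15) for a generic positive scaling $m$, and then sets $m = r$ to recover the dual generalized K-Ld.

```python
import math

def ln_q(x, q):
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def phi(t, q_star):
    return t * ln_q(t, q_star)

def dphi(t, q_star):
    # d/dt [t ln_{q*} t] = ln_{q*} t + t^(1 - q*)
    return ln_q(t, q_star) + t ** (1.0 - q_star)

def scaled_bregman(p, r, m, q_star):
    # Discrete form of Eq. (2) with generating function phi(t) = t ln_{q*} t.
    total = 0.0
    for p_i, r_i, m_i in zip(p, r, m):
        t, z = p_i / m_i, r_i / m_i
        total += m_i * (phi(t, q_star) - phi(z, q_star) - dphi(z, q_star) * (t - z))
    return total

def eq15_last_line(p, r, m, q_star):
    # sum_x { p m^(q*-1) [ln_{q*} p - ln_{q*} r] - (p - r) (r/m)^(1-q*) }
    return sum(p_i * m_i ** (q_star - 1.0) * (ln_q(p_i, q_star) - ln_q(r_i, q_star))
               - (p_i - r_i) * (r_i / m_i) ** (1.0 - q_star)
               for p_i, r_i, m_i in zip(p, r, m))

def dual_kld(p, r, q_star):
    return sum(a * ln_q(a / b, q_star) for a, b in zip(p, r))

p = [0.2, 0.5, 0.3]
r = [0.4, 0.4, 0.2]
m = [0.5, 1.5, 2.0]   # an arbitrary positive, unnormalized scaling
for q_star in (0.6, 1.4):
    print(q_star, scaled_bregman(p, r, m, q_star), eq15_last_line(p, r, m, q_star))
    print(q_star, scaled_bregman(p, r, r, q_star), dual_kld(p, r, q_star))
```

With $m = r$ the correction term reduces to $\sum_x (p - r)$, which vanishes for normalized $p$ and $r$.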



4 Canonical distribution minimizing the Dual Generalized K-Ld 



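Consider the Lagrangian

$$ \begin{aligned}
L(x,\alpha,\beta) &= \int_{\mathcal{X}} p(x) \ln_{q^*}\left(\frac{p(x)}{r(x)}\right) d\lambda(x) + \sum_{m=1}^{M} \beta_m \left( \int_{\mathcal{X}} p(x) u_m(x)\, d\lambda(x) - \langle u_m \rangle \right) - \alpha \left( \int_{\mathcal{X}} p(x)\, d\lambda(x) - 1 \right) \\
&\overset{(a)}{=} \int_{\mathcal{X}} \ln_{q^*}\left(\frac{dP}{dR}(x)\right) dP(x) + \sum_{m=1}^{M} \beta_m \left( \int_{\mathcal{X}} u_m(x)\, dP(x) - \langle u_m \rangle \right) - \alpha \left( \int_{\mathcal{X}} dP(x) - 1 \right),
\end{aligned} \tag{17} $$

where $u_m$, $m = 1, \ldots, M$ are some $\mathcal{A}$-measurable observables.
In the second relation in (17), (a) implies invoking (3) and Definition 4;
after some abuse of notation, it utilizes the relation
$\frac{p(x)}{r(x)} = \frac{dP}{dR}(x)$, which follows from (1), (3), and
Definition 4. Here, the normal average expectations are defined as

$$ \int_{\mathcal{X}} p(x) u_m(x)\, d\lambda(x) = \langle u_m \rangle; \quad m = 1, \ldots, M. \tag{18} $$

The variational minimization with respect to the probability measure $P$
acquires the form

$$ \frac{\delta L(x,\alpha,\beta)}{\delta P} = 0 \;\Rightarrow\; \ln_{q^*}\left(\frac{p(x)}{r(x)}\right) + \sum_{m=1}^{M} \beta_m u_m(x) - \alpha = 0. \tag{19} $$

Thus

$$ p(x) = r(x) \exp_{q^*}\left( \alpha - \sum_{m=1}^{M} \beta_m u_m(x) \right). \tag{20} $$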

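Thus, the posterior probability minimizing the dual generalized K-Ld is

$$ p(x) = \frac{ r(x) \exp_{q^*}\left( -\sum_{m=1}^{M} \beta^*_m(x) u_m(x) \right) }{ \left( 1 + (1-q^*)\alpha \right)^{\frac{1}{q^*-1}} }. \tag{21} $$

Comparing (20) and (21) term by term implies the scaled multipliers
$\beta^*_m(x) = \beta_m / \left(1 + (1-q^*)\alpha\right)$.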

Here, (21) highlights the operational advantage in employing dual Tsallis
measures of uncertainty, since they readily yield the q*-deformed exponential
form as a consequence of variational minimization when using normal average
constraints. Multiplying (19) by $p(x)$, integrating with respect to the
measure $\lambda(x)$, and invoking (18) and the normalization condition:
$\int_{\mathcal{X}} p(x)\, d\lambda(x) = 1$ yields

$$ \int_{\mathcal{X}} p(x) \ln_{q^*}\left(\frac{p(x)}{r(x)}\right) d\lambda(x) + \sum_{m=1}^{M} \beta_m \langle u_m \rangle = \alpha. \tag{22} $$



From (21) and (22), the canonical partition function is

$$ Z\left(x, \beta^*_m(x)\right) = \left( 1 + (1-q^*)\alpha \right)^{\frac{1}{q^*-1}}. \tag{23} $$

Note that $Z\left(x, \beta^*_m(x)\right)$ and $\beta^*_m(x)$ are to be
evaluated $\forall x \in \mathcal{X}$. This feature is exhibited by the
variational minimization of generalized K-Ld's and generalized mutual
informations employing normal average constraints [2,3]. From (23)

$$ \alpha = \ln_{q^*}\left( \frac{1}{Z\left(x, \beta^*_m(x)\right)} \right). \tag{24} $$
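The passage from (20) to (21), (23), and (24) rests on a factorization of the deformed exponential. The sketch below checks it at a single point $x$ with made-up numbers. It assumes, as an inference from comparing (20) with (21) rather than a definition quoted from the text, that the scaled multipliers are $\beta^*_m = \beta_m / (1 + (1-q^*)\alpha)$; the function name `canonical_forms` is hypothetical.

```python
import math

def ln_q(x, q):
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(x, q):
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    return 0.0 if base <= 0.0 else base ** (1.0 / (1.0 - q))

def canonical_forms(r_x, alpha, beta, u_x, q_star):
    """Return (p from Eq. (20), p from Eq. (21), Z from Eq. (23)) at one point x."""
    scale = 1.0 + (1.0 - q_star) * alpha          # assumed positive
    z = scale ** (1.0 / (q_star - 1.0))           # Eq. (23)
    beta_star = [b / scale for b in beta]         # assumed scaled multipliers
    p20 = r_x * exp_q(alpha - sum(b * u for b, u in zip(beta, u_x)), q_star)
    p21 = r_x * exp_q(-sum(b * u for b, u in zip(beta_star, u_x)), q_star) / z
    return p20, p21, z

q_star, alpha = 1.4, 0.3
beta = [0.5, -0.2]      # Lagrange multipliers (illustrative values)
u_x = [1.2, 0.7]        # values u_m(x) of two observables at a fixed x
p20, p21, z = canonical_forms(0.25, alpha, beta, u_x, q_star)
print(p20, p21)
print(alpha, ln_q(1.0 / z, q_star))   # Eq. (24)
```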

From (21), (22), and (24), it is evident that the form of the posterior
probability minimizing the dual generalized K-Ld is self-referential [1].
Further, for:
$1 - (1-q^*) \sum_{m=1}^{M} \beta^*_m(x) u_m(x) < 0$, the canonical posterior
probability in (21): $p(x) = 0$. This is known as the Tsallis cut-off
condition [1]. Substituting (24) into (22) yields the minimum dual generalized
K-Ld

$$ D^{q^*}_{K-L}[p\|r] = \ln_{q^*}\left( \frac{1}{Z\left(x, \beta^*_m(x)\right)} \right) - \sum_{m=1}^{M} \beta_m \langle u_m \rangle. \tag{25} $$
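Relations (20), (22), (24), and (25) can be exercised together on a toy problem. The sketch below is an illustration under several assumptions that are not in the text: a four-point set with a uniform prior, a single observable, a fixed multiplier $\beta$, and a plain bisection on $\alpha$ to enforce normalization (the text does not prescribe a numerical scheme). All names and numbers are hypothetical.

```python
import math

def ln_q(x, q):
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(x, q):
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    return 0.0 if base <= 0.0 else base ** (1.0 / (1.0 - q))

def posterior(alpha, beta, u, r, q_star):
    # Eq. (20) with a single observable: p(x) = r(x) exp_{q*}(alpha - beta u(x)).
    return [r_i * exp_q(alpha - beta * u_i, q_star) for r_i, u_i in zip(r, u)]

def solve_alpha(beta, u, r, q_star, lo=0.0, hi=2.0, iters=200):
    """Bisection on alpha so that the posterior sums to one.

    Assumes the total mass is increasing in alpha and that [lo, hi]
    brackets the root, which holds for the toy numbers used below."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(posterior(mid, beta, u, r, q_star)) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

q_star, beta = 0.8, 0.3
r = [0.25, 0.25, 0.25, 0.25]
u = [0.0, 1.0, 2.0, 3.0]

alpha = solve_alpha(beta, u, r, q_star)
p = posterior(alpha, beta, u, r, q_star)
mean_u = sum(p_i * u_i for p_i, u_i in zip(p, u))                  # Eq. (18)
d_min = sum(p_i * ln_q(p_i / r_i, q_star) for p_i, r_i in zip(p, r))
z = (1.0 + (1.0 - q_star) * alpha) ** (1.0 / (q_star - 1.0))       # Eq. (23)

print(sum(p), alpha)
print(d_min, ln_q(1.0 / z, q_star) - beta * mean_u)                # Eq. (25)
```

The multiplier is held fixed here and the expectation $\langle u \rangle$ is read off afterwards; matching a prescribed $\langle u \rangle$ would need an outer search over $\beta$, which this sketch does not attempt.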



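From (25), the following Legendre transform relations are obtained

$$ \frac{\partial D^{q^*}_{K-L}[p\|r]}{\partial \langle u_m \rangle} = -\beta_m, \qquad \frac{\partial}{\partial \beta_m} \ln_{q^*}\left( \frac{1}{Z\left(x, \beta^*_m(x)\right)} \right) = \langle u_m \rangle. \tag{26} $$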



5 Pythagorean theorem for the dual generalized K-Ld 



Theorem 2: Let $r(x)$ be the prior probability distribution, and $p(x)$ be the
posterior probability distribution that minimizes the dual generalized K-Ld
subject to a set of constraints

$$ \int_{\mathcal{X}} p(x) u_m(x)\, d\lambda(x) = \langle u_m \rangle; \quad m = 1, \ldots, M. \tag{27} $$

Let $l(x)$ be any other (unknown) distribution satisfying the constraints

$$ \int_{\mathcal{X}} l(x) u_m(x)\, d\lambda(x) = \langle w_m \rangle; \quad m = 1, \ldots, M. \tag{28} $$

Then

(i) $D^{q^*}_{K-L}[l\|p]$ is minimum only if (Shore expectation matching condition)

$$ \langle u_m \rangle = \langle w_m \rangle. \tag{29} $$

(ii) From (29)

$$ D^{q^*}_{K-L}[l\|r] = D^{q^*}_{K-L}[l\|p] + D^{q^*}_{K-L}[p\|r] + (1-q^*)\, D^{q^*}_{K-L}[p\|r]\, D^{q^*}_{K-L}[l\|p]. \tag{30} $$
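The cross term in (30) has the same structure as the pointwise pseudo-additivity of the deformed logarithm, $\ln_q(xy) = \ln_q x + \ln_q y + (1-q)\ln_q x \ln_q y$. This is a standard identity of q-deformed algebra rather than a statement taken from the text. The sketch below (hypothetical names, arbitrary test values) checks it with $x = l/p$ and $y = p/r$ at a single point, so that $xy = l/r$.

```python
import math

def ln_q(x, q):
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def pseudo_additive_rhs(x, y, q):
    # ln_q x + ln_q y + (1 - q) ln_q x ln_q y
    a, b = ln_q(x, q), ln_q(y, q)
    return a + b + (1.0 - q) * a * b

l_x, p_x, r_x = 0.35, 0.20, 0.40   # values of l, p, r at one point (illustrative)
for q_star in (0.7, 1.3):
    lhs = ln_q(l_x / r_x, q_star)
    rhs = pseudo_additive_rhs(l_x / p_x, p_x / r_x, q_star)
    print(q_star, lhs, rhs)
```

This pointwise identity only shows where a term proportional to $(1-q^*)$ originates; it does not by itself establish the integrated relation (30), which additionally relies on the constraints (27) to (29).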

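Proof: Taking the difference between the dual generalized K-Ld's yields

$$ D^{q^*}_{K-L}[l\|r] - D^{q^*}_{K-L}[l\|p] = \int_{\mathcal{X}} l(x) \left[ \ln_{q^*}\left(\frac{l(x)}{r(x)}\right) - \ln_{q^*}\left(\frac{l(x)}{p(x)}\right) \right] d\lambda(x), \tag{31} $$

while multiplying and dividing the integrand of (31) by
$1 + (1-q^*) \ln_{q^*}\left(\frac{l(x)}{p(x)}\right)$ leads to

$$ D^{q^*}_{K-L}[l\|r] - D^{q^*}_{K-L}[l\|p] = \int_{\mathcal{X}} l(x) \left\{ \frac{ \ln_{q^*}\left(\frac{l(x)}{r(x)}\right) - \ln_{q^*}\left(\frac{l(x)}{p(x)}\right) }{ 1 + (1-q^*) \ln_{q^*}\left(\frac{l(x)}{p(x)}\right) } \left[ 1 + (1-q^*) \ln_{q^*}\left(\frac{l(x)}{p(x)}\right) \right] \right\} d\lambda(x). \tag{32} $$

Invoking now the definition of the q*-deformed difference from (11) (by
replacing $q$ with $q^*$) results in:
$\frac{ \ln_{q^*}\left(\frac{l(x)}{r(x)}\right) - \ln_{q^*}\left(\frac{l(x)}{p(x)}\right) }{ 1 + (1-q^*) \ln_{q^*}\left(\frac{l(x)}{p(x)}\right) } = \ln_{q^*}\left(\frac{p(x)}{r(x)}\right)$.
Thus, after re-arranging the terms (32) results in

$$ \begin{aligned}
D^{q^*}_{K-L}[l\|p] - D^{q^*}_{K-L}[l\|r] &= -\int_{\mathcal{X}} l(x) \left\{ \ln_{q^*}\left(\frac{p(x)}{r(x)}\right) \left[ 1 + (1-q^*) \ln_{q^*}\left(\frac{l(x)}{p(x)}\right) \right] \right\} d\lambda(x) \\
&= -\int_{\mathcal{X}} \left\{ l(x) \ln_{q^*}\left(\frac{p(x)}{r(x)}\right) + (1-q^*) \ln_{q^*}\left(\frac{p(x)}{r(x)}\right) D^{q^*}_{K-L}[l\|p] \right\} d\lambda(x).
\end{aligned} \tag{33} $$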



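At this point we expand (33) and invoke (19), (24), and (28) to arrive at

$$ \begin{aligned}
D^{q^*}_{K-L}[l\|p] &= D^{q^*}_{K-L}[l\|r] - \ln_{q^*}\left(\frac{1}{Z(\bullet)}\right) + \int_{\mathcal{X}} l(x) \sum_{m=1}^{M} \beta_m u_m(x)\, d\lambda(x) - (1-q^*) \ln_{q^*}\left(\frac{p(x)}{r(x)}\right) D^{q^*}_{K-L}[l\|p] \\
&= D^{q^*}_{K-L}[l\|r] - \ln_{q^*}\left(\frac{1}{Z(\bullet)}\right) + \sum_{m=1}^{M} \beta_m \langle w_m \rangle - (1-q^*)\, D^{q^*}_{K-L}[l\|p] \ln_{q^*}\left(\frac{p(x)}{r(x)}\right),
\end{aligned} \tag{34} $$

where: $\int_{\mathcal{X}} l(x)\, d\lambda(x) = 1$. Note that:
$Z(\bullet) = Z\left(x, \beta^*_m(x)\right)$. Multiplying and dividing the
fourth term on the RHS of (34) by $p(x)$ and integrating over the measure
$\lambda(x)$ yields

$$ D^{q^*}_{K-L}[l\|p] = D^{q^*}_{K-L}[l\|r] - \ln_{q^*}\left(\frac{1}{Z(\bullet)}\right) + \sum_{m=1}^{M} \beta_m \langle w_m \rangle - (1-q^*)\, D^{q^*}_{K-L}[l\|p]\, \frac{ \int_{\mathcal{X}} p(x) \ln_{q^*}\left(\frac{p(x)}{r(x)}\right) d\lambda(x) }{ \int_{\mathcal{X}} p(x)\, d\lambda(x) }. \tag{35} $$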
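Now, setting $\int_{\mathcal{X}} p(x)\, d\lambda(x) = 1$, (35) acquires the form

$$ D^{q^*}_{K-L}[l\|p] = D^{q^*}_{K-L}[l\|r] - \ln_{q^*}\left(\frac{1}{Z(\bullet)}\right) + \sum_{m=1}^{M} \beta_m \langle w_m \rangle - (1-q^*)\, D^{q^*}_{K-L}[l\|p]\, D^{q^*}_{K-L}[p\|r], \tag{36} $$

and, with the aid of (25), (36) yields

$$ D^{q^*}_{K-L}[l\|p] = D^{q^*}_{K-L}[l\|r] - \ln_{q^*}\left(\frac{1}{Z(\bullet)}\right) + \sum_{m=1}^{M} \beta_m \langle w_m \rangle - (1-q^*) \left( \ln_{q^*}\left(\frac{1}{Z(\bullet)}\right) - \sum_{m=1}^{M} \beta_m \langle u_m \rangle \right) D^{q^*}_{K-L}[l\|p]. \tag{37} $$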

The minimum dual generalized K-Ld condition is

$$ \frac{\partial D^{q^*}_{K-L}[l\|p]}{\partial \beta_m} = 0. \tag{38} $$

This implies that the posterior pdf $p$ whose canonical form is given by (21)
not only minimizes: $D^{q^*}_{K-L}[l\|p]$ but also minimizes:
$D^{q^*}_{K-L}[p\|r]$ as well. Subjecting (37) to (38) and invoking the second
Legendre transform relation in (26) yields the Shore expectation matching
condition [34] for the dual generalized K-Ld

$$ \langle u_m \rangle = \langle w_m \rangle. \tag{39} $$

Substituting now (39) into (37) and invoking (25) allows one to write

$$ D^{q^*}_{K-L}[l\|p] = D^{q^*}_{K-L}[l\|r] - D^{q^*}_{K-L}[p\|r] - (1-q^*)\, D^{q^*}_{K-L}[p\|r]\, D^{q^*}_{K-L}[l\|p]. \tag{40} $$

The Pythagorean theorem for the dual generalized K-Ld with normal average
constraints has two distinct regimes, depending upon the range of the dual
nonadditive parameter

$$ \begin{aligned}
D^{q^*}_{K-L}[l\|r] &\geq D^{q^*}_{K-L}[l\|p] + D^{q^*}_{K-L}[p\|r]; \quad 0 < q^* < 1, \\
D^{q^*}_{K-L}[l\|r] &\leq D^{q^*}_{K-L}[l\|p] + D^{q^*}_{K-L}[p\|r]; \quad q^* > 1.
\end{aligned} \tag{41} $$

These regimes follow from (30): the dual generalized K-Ld's are nonnegative,
so the cross term $(1-q^*)\, D^{q^*}_{K-L}[p\|r]\, D^{q^*}_{K-L}[l\|p]$ carries
the sign of $(1-q^*)$.

While Theorem 2 is called the Pythagorean theorem, (30) is referred to as the
nonadditive triangular equality for the dual generalized K-Ld. It is
interesting to note that the expectation-matching condition (29) has a form
identical to the case of the B-G-S model ($q^* \to 1$), and differs from that
of the Pythagorean theorem for the "usual" form of the generalized K-Ld for
the case of constraints defined by C-T expectations and q-averages,
respectively [27, 28]. Also to be noted is the fact that the minimum dual
generalized K-Ld condition (38) is guaranteed. This differs from the case of
the q-averages constraints derived in previous works [27, 28]. This feature is
of importance when generalizing the minimum Bregman information principle to
the case of deformed statistics.



6 Summary and conclusions



This Letter has proven that the dual generalized Kullback-Leibler divergence
(K-Ld) is a scaled Bregman divergence, within a measure-theoretic framework.
Also, the Pythagorean theorem for the dual generalized K-Ld has been
established from normal average constraints which are consistent with both the
generalized H-theorem and the generalized Stosszahlansatz (molecular chaos
hypothesis) [24]. Qualitative distinctions of the present treatment vis-a-vis
previous studies have been briefly discussed.

Ongoing work serves a two-fold objective: (i) the Pythagorean theorem for
the dual generalized K-Ld derived herein has been employed to provide a
deformed statistics information geometric description of Plefka's expansion in
mean-field theory [38]. Details of this analysis are beyond the scope of
this Letter, and only a cursory overview is presented herein.

Extending the procedure followed in [38] to obtain the mean-field equations
[39] to the case of generalized statistics, a deformed statistics mean-field
criterion is obtained in terms of minimizing a dual generalized K-Ld. This is
accomplished by extrapolation of the information geometric arguments in [40]
to the case of deformed statistics. Application of the Pythagorean theorem
(40) results in a modified deformed statistics mean-field criterion, which when
subjected to a perturbation expansion employing results of the q-deformed
variational perturbation theory developed in [25], yields candidate deformed
statistics mean-field equations; (ii) the results obtained in this Letter serve
as the foundation to extend the sufficient dimensionality reduction model [41]
to the case of deformed statistics. Results of these studies will be published
elsewhere.